218 research outputs found

Mapping Topics and Topic Bursts in PNAS

Author: Börner Katy
Mane Ketan
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 13/02/2004

Scientific research is highly dynamic. New areas of science continually evolve; others gain or lose importance, merge or split. Due to the steady increase in the number of scientific publications it is hard to keep an overview of the structure and dynamic development of one's own field of science, much less all scientific domains. However, knowledge of hot topics, emergent research frontiers, or change of focus in certain areas is a critical component of resource allocation decisions in research labs, governmental institutions, and corporations. This paper demonstrates the use of Kleinberg's burst detection algorithm, co-word occurrence analysis, and graph layout techniques to generate maps that support the identification of major research topics and trends. The approach was applied to analyze and map the complete set of papers published in the Proceedings of the National Academy of Sciences (PNAS) in the years 1982-2001. Six domain experts examined and commented on the resulting maps in an attempt to reconstruct the evolution of major research areas covered by PNAS.
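As a rough illustration of the co-word occurrence analysis named in this abstract, the sketch below counts how often two terms appear in the same document. It is a minimal sketch, not the authors' pipeline: the function name `coword_counts` and the toy term lists are hypothetical, and term extraction and burst detection are assumed to happen elsewhere.

```python
from collections import Counter
from itertools import combinations


def coword_counts(docs):
    """Count how often each unordered pair of terms shares a document."""
    counts = Counter()
    for terms in docs:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data, not drawn from the PNAS set.
docs = [
    ["protein", "gene", "expression"],
    ["gene", "expression", "mouse"],
    ["protein", "structure"],
]
counts = coword_counts(docs)
```

The resulting pair counts can serve as edge weights of a term network, which a graph layout algorithm could then place on a map.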

arXiv.org e-Print Archive

Crossref

PubMed Central

Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Author: Börner Katy
Emmons Scott
Gallant Mike
Kobourov Stephen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/07/2016

Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
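To make the three stand-alone metrics named in this abstract concrete, the sketch below computes modularity, coverage, and per-cluster conductance for a small graph. It uses common textbook definitions and assumes an undirected, unweighted graph without self-loops or duplicate edges; the paper may normalize or aggregate conductance differently. The function name `cluster_metrics` and the toy graph are hypothetical.

```python
def cluster_metrics(edges, membership):
    """Return (modularity, coverage, conductance-per-cluster)."""
    m = len(edges)
    volume = {}    # cluster -> sum of node degrees
    internal = {}  # cluster -> edges with both ends inside the cluster
    cut = {}       # cluster -> edges with exactly one end inside
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
        else:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1
    # Modularity: intra-cluster edge fraction minus its expected value.
    modularity = sum(
        internal.get(c, 0) / m - (vol / (2 * m)) ** 2
        for c, vol in volume.items()
    )
    # Coverage: fraction of all edges that fall inside some cluster.
    coverage = sum(internal.values()) / m
    # Conductance: cut size over the smaller side's volume (lower is better).
    conductance = {
        c: cut.get(c, 0) / min(vol, 2 * m - vol)
        for c, vol in volume.items()
        if 0 < vol < 2 * m
    }
    return modularity, coverage, conductance


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q, cov, cond = cluster_metrics(edges, membership)
```

On this toy graph the two-triangle split gives a modularity of 5/14, a coverage of 6/7, and a conductance of 1/7 for each cluster.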

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

The University of Arizona

Long-distance interdisciplinarity leads to higher scientific impact

Author: Börner Katy
Haustein Stefanie
Larivière Vincent
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 30/03/2015

Scholarly collaborations across disparate scientific disciplines are challenging. Collaborators are likely to have their offices in another building, attend different conferences, and publish in other venues; they might speak a different scientific language and value an alien scientific culture. This paper presents a detailed analysis of success and failure of interdisciplinary papers, as manifested in the citations they receive. For 9.2 million interdisciplinary research papers published between 2000 and 2012 we show that the majority (69.9%) of co-cited interdisciplinary pairs are “win-win” relationships, i.e., papers that cite them have higher citation impact, and there are as few as 3.3% “lose-lose” relationships. Papers citing references from subdisciplines positioned far apart (in the conceptual space of the UCSD map of science) attract the highest relative citation counts. The findings support the assumption that interdisciplinary research is more successful and leads to results greater than the sum of its disciplinary parts.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Dépôt Institutionnel Numérique

FigShare

Mapping the Structure and Evolution of Chemistry Research

Author: Boyack Kevin W.
Börner Katy
Klavans Richard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

How does our collective scholarly knowledge grow over time? What major areas of science exist and how are they interlinked? Which areas are major knowledge producers; which ones are consumers? Computational scientometrics (the application of bibliometric/scientometric methods to large-scale scholarly datasets) and the communication of results via maps of science might help us answer these questions. This paper presents the results of a prototype study that aims to map the structure and evolution of chemistry research over a 30-year time frame. Information from the combined Science (SCIE) and Social Science (SSCI) Citation Indexes from 2002 was used to generate a disciplinary map of 7,227 journals and 671 journal clusters. Clusters relevant to studying the structure and evolution of chemistry were identified using JCR categories and were further clustered into 14 disciplines. The changing scientific composition of these 14 disciplines and their knowledge exchange via citation linkages was computed. Major changes in the dominance, influence, and role of Chemistry, Biology, Biochemistry, and Bioengineering over these 30 years are discussed. The paper concludes with suggestions for future work.

IUScholarWorks (University of Indiana)

Digging by Debating: Linking massive datasets to specific arguments

Author: Andrew Ravenscroft
Chris Reed
Colin Allen
David Bourget
John Lawrence
Katy Börner
Robert Light
Simon McAlister
Publication venue: 'Modern Language Association'
Publication date: 01/01/2014

We will develop and implement a multi-scale workbench, called "InterDebates", with the goal of digging into data provided by hundreds of thousands, eventually millions, of digitized books, bibliographic databases of journal articles, and comprehensive reference works written by experts. Our hypotheses are that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and that the availability of such analyses will enable innovative interdisciplinary research, and may also play a role in supporting better-informed critical debates among students and the general public. A key challenge tackled by this project is thus to uncover and represent the argumentative structure of digitized documents, allowing users to find and interpret detailed arguments in the broad semantic landscape of books and articles.

Humanities Commons

Node, Node-Link, and Node-Link-Group Diagrams: An Evaluation

Author: Bahador Saket
Katy Börner
Paolo Simonetto
Stephen Kobourov
Publication date: 07/04/2014

Effectively showing the relationships between objects in a dataset is one of the main tasks in information visualization. Typically there is a well-defined notion of distance between pairs of objects, and traditional approaches such as principal component analysis or multi-dimensional scaling are used to place the objects as points in 2D space, so that similar objects are close to each other. In another typical setting, the dataset is visualized as a network graph, where related nodes are connected by links. More recently, datasets are also visualized as maps, where in addition to nodes and links, there is an explicit representation of groups and clusters. We consider these three techniques, characterized by a progressive increase in the amount of encoded information: node diagrams, node-link diagrams, and node-link-group diagrams. We assess these three types of diagrams with a controlled experiment that covers nine different tasks falling broadly into three categories: node-based tasks, network-based tasks, and group-based tasks. Our findings indicate that adding links, or links and group representations, does not negatively impact performance (time and accuracy) of node-based tasks. Similarly, adding group representations does not negatively impact the performance of network-based tasks. Node-link-group diagrams outperform the others on group-based tasks. These conclusions contradict results in other studies, in similar but subtly different settings. Taken together, however, such results can have significant implications for the design of standard and domain-specific visualization tools. Index Terms: graphs, networks, maps, scatter plots.

arXiv.org e-Print Archive

CiteSeerX

Scholarly Networks on Resilience, Vulnerability and Adaptation within the Human Dimensions of Global Environmental Change

Author: Börner Katy
Janssen Marco A.
Ke Weimao
Schoon Michael L.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

This paper presents the results of a bibliometric analysis of the knowledge domains resilience, vulnerability, and adaptation within the research activities on human dimensions of global environmental change. We analyzed how 2,286 publications over the last 30 years are related in terms of co-authorship relations and citation relations. The number of publications in the three knowledge domains increased rapidly during the last decade. However, the resilience knowledge domain is only weakly connected with the other two domains in terms of co-authorships and citations. The resilience knowledge domain has a background in ecology and mathematics with a focus on theoretical models, while the vulnerability and adaptation knowledge domains have a background in geography and natural hazards research with a focus on case studies and climate change research. There is an increasing number of cross citations and papers classified in multiple knowledge domains. This seems to indicate a merging of the different knowledge domains.

CiteSeerX

IUScholarWorks (University of Indiana)

Taxonomy Visualization in Support of the Semi-Automatic Validation and Optimization of Organizational Schemas

Author: Börner Katy
Hardy Elisha
Herr Bruce
Holloway Todd
Paley W. Bradford
Publication venue: 'Elsevier BV'
Publication date: 01/07/2007

Never before in history has mankind had access to and produced so much data, information, knowledge, and expertise as today. To organize, access, and manage these highly valuable assets effectively, we use taxonomies, classification hierarchies, ontologies, and controlled vocabularies, among others. We create directory structures for our files. We use organizational hierarchies to structure our work environment. However, the design and continuous update of these organizational schemas, which potentially have thousands of class nodes to organize millions of entities, is challenging for any human being. The Taxonomy Visualization and Validation (TV) tool introduced in this paper supports the semi-automatic validation and optimization of organizational schemas such as file directories, classification hierarchies, taxonomies, or any other structure imposed on a data set as a means of organization, structuring, and naming. By showing the “goodness of fit” of a schema and the potentially millions of entities it organizes, the TV eases the identification and reclassification of misclassified information entities, the identification of classes that grew over-proportionally, the evaluation of the size and homogeneity of existing classes, the examination of the “well-formedness” of an organizational schema, etc. As an example, the TV is applied to display the United States Patent and Trademark Office patent classification, which organizes more than three million patents into about 160,000 distinct patent classes. The paper concludes with a discussion and an outlook on future work.

IUScholarWorks (University of Indiana)